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Abstract:  A  simple  and  effective  way  to  exploit  parallel  processors  in  discrete  event  simu¬ 

lations  is  to  run  multiple  independent  replications,  in  parallel,  on  multiple  processors  and  to 
average  the  results  at  the  end  of  the  runs.  We  call  this  the  method  of  parallel  replications 
This  paper  is  concerned  with  using  the  method  of  parallel  replications  for  estimating  steady- 
state  performance  measures.  We  report  on  the  results  of  queueing  network  simulation 
experiments  that  compare  the  statistical  properties  of  several  possible  estimators  that  can 
be  formed  using  this  method.  The  theoretical  asymptotic  properties  of  these  estimators 
were  determined  in  Glynn  and  Heidelberger  (1989a  and  1989b).  Both  the  theory  and  the 
experimental  results  reported  here  strongly  indicate  that  a  nonstandard  (in  the  context  of 
steady-state  simulation),  yet  easy  to  apply,  estimation  procedure  is  required  on  highly  par¬ 
allel  machines.  This  nonstandard  estimator  is  a  ratio  estimator.  The  experiments  also  show 
that  use  of  the  ratio  estimator  is  advantageous  even  on  machines  with  only  a  moderate 
degree  of  parallelism. 
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1.  Introduction 

A  simple  and  effective  way  to  exploit  parallel  processors  for  computationally  intensive  dis¬ 
crete  event  simulations  is  to  run  multiple  independent  replications,  in  parallel,  on  multiple 
processors  and  to  average  the  results  at  the  end  of  the  runs.  We  call  this  the  method  of  parallel 
replications.  This  paper  is  concerned  with  using  the  method  of  parallel  replications  for  estimating 
steady -state  performance  measures.  In  particular,  we  report  on  the  results  of  queueing  network 
simulation  experiments  that  compare  the  statistical  properties  of  several  possible  estimators  that 
can  be  formed  using  this  method.  The  theoretical  asymptotic  properties  of  these  estimators  were 
determined  in  Glynn  and  Heidelberger  (1989a  and  1989b).  Both  the  theory  and  the  experimental 
results  reported  here  strongly  indicate  that  a  nonstandard  (in  the  context  of  steady-state  simu¬ 
lation),  yet  easy  to  ?pply,  estimation  procedure  is  required  on  highly  parallel  machines.  This  non¬ 
standard  estimator  takes  the  form  of  a  ratio  estimator.  The  experiments  also  show  that  use  of  the 
ratio  estimator  is  advantageous  even  on  machines  with  only  a  moderate  degree  of  parallelism. 

We  remark  that  an  alternative  approach  to  parallel  processing  of  simulations  is  distributed 
simulation,  in  which  multiple  processors  cooperate  together  to  generate  a  single  realization  of  the 
stochastic  process  being  simulated.  For  an  excellent  introduction  to  distributed  simulation  and  a 
thorough  bibliography  on  this  topic,  see  Fujimoto  (1989).  A  theoretical  comparison  of  the  statis¬ 
tical  efficiencies  of  parallel  replications  and  distributed  simulation  for  estimating  steady-state 
parameters  may  be  found  in  Heidelberger  (1986). 

Intuitively,  when  using  the  method  of  parallel  replications  on  a  large  number  of  processors, 
one  expects  to  get  highly  accurate  estimates  after  only  a  relatively  short  amount  of  time. 
However,  there  are  some  potentially  serious  statistical  problems  inherent  in  this  approach,  and 
careful  estimation  procedures  must  be  applied  in  order  to  obtain  estimates  with  the  proper  (or 
desired)  statistical  properties.  These  problems  basically  arise  because  any  bias  effects  are  magni¬ 
fied  on  highly  parallel  machines,  i.e.,  because  of  the  bias,  one  obtains  highly  accurate  estimates  of 
the  wrong  quantity. 

In  the  context  of  estimating  transient  performance  measures  (or  steady-state  performance 
measures  in  regenerative  simulations),  these  problems  have  been  identified  and  addressed  in 
Heidelberger  (1988)  and  Glynn  and  Heidelberger  (1990).  These  papers  show  that  nonstandard 
estimators  are  required  on  highly  parallel  machines.  Other  issues  related  to  parallel  replications 
for  estimating  transient  quantities  are  described  in  Bhavsar  and  Isaac  (1987). 

For  estimating  steady-state  performance  measures,  the  traditional  approaches  (on  a  single 
processor)  are  to  use  either  the  method  of  batch  means,  or  independent  replications  with  initial 
transient  deletion  (see,  e.g.,  Law  (1977),  Law  and  Carson  (1979),  Law  and  Kelton  (1982),  or 


Bratley,  Fox  and  Schrage  (1987)).  When  using  replications,  it  is  generally  advised  to  use  only  “a 
few  long”  replications  (say  10  to  20)  with  deletion  to  reduce  susceptibility  to  the  effects  of  initial¬ 
ization  bias. 

With  the  prospect  of  parallelism  as  motivation,  Glynn  and  Heidelberger  (1989a  and  1989b) 
have  addressed,  from  a  theoretical  point  of  view,  how  one  should  control  the  number  of  repli¬ 
cations  (processors),  the  length  of  each  replication,  and  the  length  of  the  initial  transient  deletion 
interval  in  order  to  obtain  valid  central  limit  theorems  for  steady-state  parameters.  Such  central 
limit  theorems  can  then  be  used  as  the  basis  for  confidence  interval  formation.  These  papers, 
which  extend  the  single  processor  results  of  Glynn  (1987  and  1990),  show  that  valid  confidence 
intervals  can  be  obtained  even  for  a  very  large  number  of  processors  P  (relative  to  the  replication 
length)  provided  the  deletion  interval  grows  appropriately  and  the  proper  (nonstandard)  ratio  esti¬ 
mator  is  used. 

On  the  other  hand,  if  each  processor  is  run  for  a  prespecified  amount  of  computer  time  c, 
then  it  was  shown  that  initial  transient  deletion  does  not,  in  fart,  remove  the  dominant  term  in 
the  bias  expansion  (i.e.,  the  term  of  order  l/c)  of  the  traditional  (standard)  independent  repli¬ 
cations  estimator,  aj(P,  c).  In  this  case,  the  amount  of  simulated  time  generated  by  each  processor 
is  a  random  variable  (rv)  and  thus  the  traditional  estimator  becomes  a  ratio  estimator.  The  bias 
expansion  of  this  estimator  reveals  two  sources  of  bias  of  order  l/c: 

1.  “Initialization'’  bias,  i.e.,  bias  essentially  due  to  the  simulation  not  being  started  in  steady- 
state  conditions. 

2.  “Ratio”  bias,  i.e.,  bias  due  to  the  fart  that  the  denominator  of  the  ratio  estimator  is  a  rv. 

When  done  appropriately,  initial  transient  deletion  effectively  removes  the  initialization  bias. 
However,  initial  transient  deletion  does  not  remove  the  ratio  bias.  The  nonstandard  estimator, 
aR(P,  c),  corresponds  to  the  classical  ratio  estima^r  which  is  typically  used  in  sample  surveys  (see, 
e.g.,  Cochran  (1963))  and  regenerative  simulations  (see,  e.g.,  Crane  and  Iglehart  (1975)  or  Iglehart 
(1975)).  The  initialization  bias  (of  order  l/c)  in  <xR(P,  c)  is  the  same  as  the  initialization  bias  in 
aj<P,  c),  but  the  ratio  bias  in  a R(P,  c)  is  P  times  smaller  than  the  ratio  bias  in  aj{P,c).  Thus 
initial  transient  deletion  effectively  removes  all  bias  of  order  l/c  from  o iR{P,  c). 

The  net  effect  of  this  analysis  is  that,  when  using  the  ratio  estimator  aR(P}  c),  valid  confi¬ 
dence  intervals  for  steady-state  parameters  can  be  formed  when  very  highly  parallel  machines 
(large  P)  are  run  for  a  relatively  short  amount  of  time  (small  c/P).  In  this  situation,  valid  confi¬ 
dence  intervals  are  not  obtained  when  estimating  the  steady-state  parameter  by  the  traditional  esti¬ 
mator  ctj{P,  c).  Using  aj(P,c),  valid  confidence  intervals  are  only  obtained  when  c/P  is  very 
large,  i.e.,  when  the  length  of  each  replication  is  latge  with  respect  to  the  number  of  processors. 
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We  emphasize  that  aj(P,  c)  and  aR(P,  c)  both  make  use  of  exactly  the  same  underlying  data:  they 
merely  average  these  data  differently. 

The  purpose  of  this  paper  is  to  demonstrate,  experimentally,  that  this  dramatic  difference 
between  the  theoretical  asymptotic  behaviors  of  these  two  estimators  is  exhibited  in  sample  sizes 
that  are  not  unreasonable  in  practice.  In  simulations  of  simple  queueing  systems,  we  show  that 
noticeable  effects  (increased  bias  and  decreased  confidence  interval  coverage)  are  present  on  as  few 
as  32  to  64  processors.  Severe  effects  are  observed  on  128  or  more  processors.  Thus,  from  both  a 
theoretical  and  practical  viewpoint,  the  traditional  estimator,  aj{P,  c),  should  be  avoided  on. even 
moderately  sized  parallel  processors. 

The  rest  of  the  paper  is  organized  as  follows.  In  Section  2,  we  summarize  the  relevant  the¬ 
oretical  results  from  Glynn  and  Heidelberger  (1989a  and  1989b)  and  Glynn  (1987  and  1990).  In 
Section  3,  we  describe  the  queueing  models  that  we  used  for  experimentation.  In  Section  4,  we 
describe  the  design  of  the  experiments,  including  point  and  interval  estimation  procedures.  The 
results  of  the  simulation  experiments  are  presented  in  Section  5.  Finally,  Section  6  contains  a 
summary  of  our  findings,  a  discussion  of  their  relevance  to  traditional  steady-state  estimation  on 
single  processor  systems,  and  an  indication  of  related  future  research  topics. 

2.  Summary  of  Theoretical  Results 

The  results  that  we  quote  from  Glynn  and  Heidelberger  (1989a  and  1989b)  and  Glynn  (1987 
and  1990)  were  derived  und  ,;  reasonable,  yet  fairly  technical  assumptions.  These  basically  involve 
assumptions  concerning  the  existence  of  central  limit  theorems  and  their  associated  uniform 
integrability  (i.e.,  moment  convergence  in  the  central  limit  theorem),  an  exponential  convergence 
rate  to  the  steady-state  distribution,  and  certain  boundedness  conditions.  Since  a  precise  state¬ 
ment  of  these  conditions  would  be  rather  tedious  (and  not  particularly  illuminating  for  the  present 
purposes),  we  will  make  the  simplifying  assumption  that  the  process  being  simulated  is  an  irreduc¬ 
ible,  finite  state  space,  continuous  time  Markov  Chain  (CTMC)  with  state  space  denoted  by  E. 
Such  processes  automatically  satisfy  all  of  the  necessary  assumptions. 

We  let  {A'(r),  s  ^  0}  denote  the  CTMC.  The  parameter  s  denotes  simulated  time  so  that 
X(s)  is  the  state  of  the  process  at  simulated  time  s.  There  then  exists  a  rv  X  such  that  X(s)=>X 
where  =>  denotes  convergence  in  distribution.  We  call  X  the  steady-state  distribution  and  we  shall 
be  interested  in  estimating  quantities  of  the  form  a.  =  E[/(A)]  for  some  function  / 

There  are  P  processors.  Simultaneously  an  independent  simulation  of  the  CTMC  is  started 
on  each  processor.  We  let  Xt(s)  denote  the  state  of  the  process  at  simulated  time  s  on  processor  i, 
i—\,...,P.  Let  7~(c)  denote  the  simulation  time  on  processor  i  after  c  units  of  computer  tune. 
(The  discussion  in  this  paper  also  holds  if  c  is  measured  in  units  of  "wall  clock"  time,  or  for  that 
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matter,  any  other  way  of  measuring  time.)  Let  C^s)  denote  the  amount  of  computer  time 
required  on  processor  i  to  obtain  s  units  of  simulated  time. 

There  are  a  variety  of  ways  to  set  the  run  length.  We  will  consider  two  reasonable  and 
practical  approaches.  In  the  first  approach,  a  fixed  amount  of  simulated  time,  say  tP,  is  generated 
on  each  processor.  In  this  case,  the  completion  time  of  the  simulation  experiment  is 
C(tP)  =  max{C,(rP), ... ,  Cp(tp)),  which  is  a  rv.  Since  we  can  view  {C,{r),  s  ^  0}  as  a  cumulative 
process,  it  is  reasonable  to  assume  that  each  Ct{tP)  obeys  a  central  limit  theorem,  i.e.  there  exist 
finite  positive  constants  X  and  <r,  such  that 


C,{tp)  —  tPl  X 

J~h> 


a,  iV(0,l) 


(2.1) 


where  /V(0,1)  denotes  a  normally  distributed  rv  with  mean  zero  and  variance  one.  The  parameter 
X~]  is  the  long  run  rate  at  which  computer  time  is  expended  per  unit  of  simulated  time.  Alterna¬ 
tively,  X  is  the  long  run  rate  at  which  simulated  time  is  generated  per  unit  of  computer  time. 
Since  the  completion  time  is  the  maximum  of  iid  (independent  and  identically  distributed)  rvs  that 
are  approximately  normally  distributed,  the  expected  completion  time  is  approximately  equal  to 
(tp/X)  +  a^Jllp  \n{P)  provided  tp  and  P  —  oo  appropriately.  In  this  expression,  (tp/X)  is  the 
expected  completion  time  of  an  individual  processor  and  <rlv/2/P  ln(P)  is  the  additional  time  until 
the  last  processor  finishes.  The  factor  ln(P)  arises  as  the  maximum  of  P  iid  N(0,1)  rvs,  which 
then  gets  multiplied  by  the  standard  deviation  of  an  individual  completion  time,  axJT^ .  Notic~ 
that  if  <r,  =  0,  then  C;(/P)  =  tpjX,  i.e.,  computer  time  is  deterministically  proportional  to  simulateu 
time  and  there  is  no  completion  time  penalty.  Since  the  holding  time  in  a  state  (in  simulated  time 
units)  is  a  rv  and  since  the  amount  of  work  (computer  time)  to  generate  a  transition  may  depend 
on  the  state  of  the  system  (e.g.,  the  time  to  put  an  event  on  the  future  event  list  typically  grows 
with  the  length  of  the  list),  we  view  such  proportionality  as  the  exception,  rather  than  the  rule. 
Thus,  in  general,  the  completion  time  penalty  grows  as  j2tP  In (P)  which  is  clearly  undesirable. 

In  the  second  approach,  we  stop  each  simulation  at  exactly  the  same  computer  time,  c.  In 
this  approach,  the  completion  time  of  the  experiment  is  deterministic,  but  the  amount  of  simu¬ 
lated  time  generated  on  each  processor,  Tt(c),  is  now  a  rv.  Note  that  T^c)  =  sup{  t  ^  0  :  C^t)  <,  c), 
so  that  (T}(c),  c^O}  is  the  inverse  process  of  (C/t),  f  ^  0).  We  will  assume  that  C^t)  can  be 
represented  as  an  integral,  i.e., 

Cj(t)  =  f  x(A )(s))  ds  where  0  <  x(f)  <  oo  for  ally  e  E.  (2  2) 
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It  is  well  known  that  such  CTMCs  satisfy  a  bivariate  central  limit  theorem: 


vr 


m 

nm)  & 

> _ 

T,{c) 


-  a  , 


m 

c 


=>  N(0,A) 


(2.3) 


where  N(0,A)  denotes  a  bivariate  normally  distributed  random  vector  with  means  zero  and 
covariance  matrix  A.  In  fact,  a  slightly  stronger  version  of  this  central  limit  theorem  is  valid  (and 
required),  namely  a  functional  central  limit  theorem  version  of  Equation  2.3.  See  Billingsley 
(1968)  for  a  discussion  of  functional  central  limit  theorems.  In  practice,  this  is  not  a  restriction. 

We  next  assume  that  all  the  data  generated  in  the  first  k(c)  units  of  computer  time  are 
deleted  for  the  purpose  of  reducing  initialization  bias.  We  assume  that  k(c )  is  a  deterministic 
quantity.  Define  yt(c)  =  7,(k(c))  to  be  the  (random)  simulated  time  at  which  processor  i  begins 
collecting  data  for  steady-state  estimation  and  let 

fnc) 

YA)  =  Am)  ds  .  (2.4) 

Jv,(c) 


The  length  of  the  interval  in  which  it  is  assumed  that  the  process  is  "in  steady-state”  is  then 
r,(c)  s  Tj(c)  —  y,(c).  The  traditional  steady-state  simulation  estimator  of  a  (assuming  we  were  sim¬ 
ulating  on  a  single  processor  and  associating  the  index  i  with  an  independent  replication)  is 


<*TiP’  c ) 


p 


m 

T,(C) 


(2.5) 


The  above  estimator  is  employed  in  most  simulation  packages  and  languages  when  steady-state 
estimation  is  performed  using  the  method  of  independent  replications  with  initial  transient 
deletion.  However,  the  ratio  form  of  a.j{P,  c)  immediately  suggests  an  alternative  estimator,  which 
is  more  suitable  for  ratio  estimation: 

p 

Z  m 

*R(P'C)  =  — -  =  —  (2.6) 

i 

_  p  p 

where  Y(c)  =  (1  IP)  £  l'(c)  and  7(c)  =  (1  IP)  £  t ;(c).  Other  simulation  contexts  in  which  such 

i *  1  1-1 

ratio  estimators  have  been  considered  are  regenerative  simulation  (identify  r,(c)  and  K,(c)  as  the 
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length  of  the  /-th  regenerative  cycle  and  an  integral  over  the  /'-th  cycle,  respectively)  and  the 
method  of  batch  means  (identify  -r t{c)  and  Y^c)  as  the  length  of  the  /'-th  batch  and  an  integral  over 
the  i-th  batch,  respectively  -  see  Fox  and  Glynn  (1987)). 

We  begin  by  stating  bias  expansions  for  a.j{P,  c)  and  *R(P,  c)  when  c  =  cP  and  we  let  P  and 
Cp  — *  oo  together.  We  first  consider  the  case  when  no  initial  transient  deletion  is  performed,  i.e., 
k(cp)  -  y^Cp)  —  C.  The  conditions  stated  above  are  sufficient  to  guarantee  the  following 
asymptotic  bias  expansions: 


E[«  AP.cpy]  =  «  +  Tf  +  o(l  lcP) 
E  laR(P'Cp)~\  =  a  +  ~  +  0(1  lcp) 


(2.7) 


where 


bR  =  a  . 


(2.8) 


The  expansion  for  txp{P,c)  was  derived  in  Glynn  (1990)  while  the  expansion  for  aR{P,c)  was 
derived  in  Glynn  and  Heidelberger  (1989b).  The  precise  form  of  the  constant  a  is  given  in  these 
papers.  Roughly  speaking,  we  can  think  of  a  as  “initialization”  bias,  i.e.,  bias  because 

rm 

E[  (/m)  -  a)  ds]  #0.  (2.9) 

Jo 


The  traditional  estimator  contains  an  extra  bias  term,  -  Anl^,  which  can  be  thought  of  as  ratio 
bias,  i.e.,  bias  because  the  denominator  of  the  ratio  is  a  rv. 

To  see  why  the  bias  expansions  of  ctj{P,  c)  and  aR(P,  c)  differ,  we  give  the  following  bnef 
heuristic  arguments  (which  are  made  rigorous  in  the  above  mentioned  papers).  Notice  that  both 
E [aj{P,  Cp)]  and  Efa^P,  cf)]  can  be  written  as  E [A(cP)IB(cp)].  Now  let 
c(cP)  =  ( B(cP )  -  E[fi(c/>)])/E[5(cp)]  and  write 

A{cP)  _  A(cP)  A  (Cp)  ( 1  -  c(cP)  +  c(cp)1 ... ) 

B(cP)  ~  E[fl(c/,)](l  +  e(cp))  *  E[B(cP)] 


Taking  expectations  of  Equation  2. 10  yields 
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A{cP)  _  E[/j(c>)]  _  Cov[/4(cp),i?(cp)] 

B(cP)  _  E[5(cp)]  E[5(cp)]2 


For  both  aj{P,  cP)  and  aR(P,  cP),  the  initialization  bias  term  a  arises  from  the  fact  that 
E[/4(cp)]/E[5(cp)]  =  E[  yXc/OJ/Ef-r^Cj))]  ¥=  a.  The  ratio  bias  arises  from  the  covariance  term  in 
Equation  2.11.  For  aj{P,Cp),  E[.3(cp)]  =  ECx^Cp)] » X  cP  and 

Cov[/l(cp),2?(cp)]  =  Cov[K/cp),  r,{cp)]  2:  cp  /.  A ]2  by  the  central  limit  theorem  in  Equation  2.3 
(and  its  uniform  integrability).  Thus  the  ratio  bias  for  a f{P,Cp)  is  -  A]2/(lcP)  as  stated.  For 
aR(P,  cP),  the  ratio  bias  is  reduced  by  a  factor  of  P  since 


Cov[A(cp),B(cP)]  =  Cov[K(cp),  ?(cp)] 


Cov[y,<Cp).  r,{cp)]  cPXAn 
P  *  P 


(2.12) 


Combining  Equations  2.11,  2.12  and  the  expression  for  E[5(Cp)]  shows  that  the  ratio  bias  of 
aR(P ,  cp)  is  0(1  IPcP)(  =  o(l/cP)  as  P  -*  oo). 

The  effect  of  the  bias  expansions  of  Equation  2.7  is  that,  without  deletion,  a.j{P ,  cP)  and 
aR(P,  cp)  obey  the  following  central  limit  theorems: 

Theorem  1 

Let  {A](.r),  s  >  0}  be  iid  samples  of  an  irreducible,  finite  state  space  CTMC  satisfying  Equations 
2.2  and  2.3.  Define  a2  -  Au  and  let  *(cp)  =  y,(Cp)  =  0.  As  P  -*  oo, 

1.  If  P/cp  — ►  oo,  Cp  — *  oo  and  bp&  0,  then  J Pep  I  aj(P,  cp)  —  a  I  =>oo 

2.  I fPlcP-*m  (0  <  m  <  oo)  and  bT^  0,  then  JPcP  (aj(P,  cP)  —  a)=>/V(0,  a2)  +  bTm 1/2. 

3.  If  P/cp  -*  0,  then  JPcP  (a  j{P,  cP)  -  a)=»Af,(0,  a2) 

Theorem  2 

Theorem  1  is  also  valid  for  o.R{P,  cP)  with  bR  replacing  bT. 

Theorems  1  and  2  imply  that,  without  deletion,  one  must  let  P/cP  -*  0  in  order  to  obtain 
valid  confidence  intervals  for  a,  i.e.,  the  length  of  each  replication  must  be  large  with  respect  to 
the  number  of  replications  (processors). 

We  next  consider  the  case  of  asymptotically  negligible  deletion,  i.e.,  Kp(cp)  -*  oo  but 
k p(cp)/cp  -*  0.  In  this  case,  it  is  shown  in  Glymi  and  Heidelberger  (1989b)  that 

E[a7f/>,  Cp)]  =  a  +  +  0(1 /Cp) 


L[as(P,  Cp)]  =  a  +  0(1 /Cp) 


(2.13) 


where 


dT  =  ■  (2- 14) 

Equations  2.13  and  2.14  imply  that,  for  a^P,  cP),  initial  transient  deletion  is  effective  in  removing 
initialization  bias,  but  does  not  remove  ratio  bias  (unless  An  =  0  in  which  case  simulated  time 
and  computer  time  are  deterministically  proportional).  The  effect  of  this  bias  expansion  on  the 
central  limit  theorem  for  a-jiP,  cP)  is  that  valid  confidence  intervals  will,  again,  only  be  obtained  if 
P/Cp  -*  0.  On  the  other  hand,  initial  transient  deletion  removes  all  sources  of  bias  of  order  1  !cP 
from  the  bias  expansion  of  &R(P,  cp).  This  will  permit  a  valid  central  limit  theorem  for  aR(P,  cP) 
even  if  P/cP  ->  oo  provided  the  length  of  the  deletion  interval  does  not  grow  too  slowly. 

Theorem  3 

Let  s  >  0}  be  iid  samples  of  an  irreducible,  finite  state  space  CTMC  satisfying  Equations 

2.2  and  2.3.  Assume  kp(cp)/cp  -*  0.  As  P  —  oo, 

1 .  If  P/cP  —  oo,  cP  -*  oo  and  dT 0,  then  JPcP  ! <*t(P,  cP)  —  a\  =>oo 

2.  If  P/cp  — »  m  (0  <  m<  oo)  and  dT^  0,  then  JPcp  (<*j{P.  cP)  -  a)^N(Q,  a2)  +  dTm]l 2. 

3.  If  P/cp  -»  0,  then  JPcp  (<*j{P,  cP)  -  a)^N{0,  a1) 

Theorem  4 

Let  {Kj(j),;^0}  be  iid  samples  of  an  irreducible,  finite  state  space  CTMC  satisfying  Equations 
2.2  and  2.3.  Assume  KP(cP)jcp  -*  0.  As  P  -*  oo,  if  either 

1.  Plcp  -*  oo  and  Kp(cp)l  ln(P)  -*  oo,  or 

2.  P/cp  -♦  m  (0  <  m  <  oo)  and  kp(cp)  -*  oo,  or 

3.  Plcp  -  0 
then 

■J Pc p  ( <xR(P ,  cP)  -  a)=>/V(0 ,a2)  .  (2.15) 

The  \n{P)  term  in  part  (1)  of  Theorem  4  arises  because  finite  state  space  CTMCs  converge 
exponentially  fast  to  their  steady-state  distribution.  As  indicated  earlier,  the  bias  expansions  and 
Theorems  1  -  4  are  valid  under  more  general  conditions.  Basically,  one  needs  a  functional  version 
of  the  central  limit  theorem  in  Equation  2.3,  uniform  integrability  of  second  moments  in  this  joint 
central  limit  theorem,  exponential  convergence  to  steady-state,  and  some  sort  of  regularity  condi¬ 
tions  on  C,(t)  and  (Aj-(j),  r^0}.  In  Glynn  and  Heidelberger  (1989b),  it  was  assumed  that 
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C/(f)  =  f0Xi(s)ds  where  xi(s)  >s  bounded  and  that  s  ^  0}  is  a  bounded  regenerative  process. 

The  regenerative  assumption  is  not  really  as  restrictive  as  it  might  seem  since  the  estimation  pro¬ 
cedures  do  not  make  use  of  the  regenerative  structure.  It  is  mainly  used  as  a  proof  device,  and  in 
addition,  many  stochastic  processes  possess  a  (hidden)  regenerative  structure  (see,  e.g.,  Glynn 
(1989)).  We  further  believe  the  result  to  be  true  for  more  general  cumulative  processes 

t  2:  0}  where,  e.g.,  Ct{t)  is  discontinuous. 

3.  Queueing  Models  Used  for  Experimentation 

In  this  section,  we  describe  four  queueing  models  that  we  used  for  determining,  exper¬ 
imentally,  the  behavior  of  aj{P,c)  and  aR(P,c).  These  represent  simplified  versions  of  models 
(with  analytically  tractable  solutions)  that  often  arise  in  simulations  of  computer  or  communi¬ 
cations  systems.  We  ran  experiments  on  the  waiting  time  process  in  an  M/M/1  queue  and  on 
three  CTMCs:  the  queue  length  processes  in  an  M/M/1  queue  with  feedback,  a  open  Jackson 
network  and  a  closed  product  form  network  (see,  e.g.,  Kleinrock  (1975)). 

For  the  M/M/1  waiting  time  simulations,  we  let  <j>  be  the  arrival  rate,  p  be  the  service  rate, 
and  p  =  4>lp  be  the  traffic  intensity.  Let  Wn  be  the  waiting  time  of  the  n-th  customer.  For  pel, 
Wn=>W.  The  performance  measure  of  interest  is  a  =  E[W]  =  p/[p(l  —  p)]. 

For  the  M/M/ 1  queue  with  feedback,  we  let  4>  denote  the  arrival  rate,  p  the  service  rate  and 
p  the  feedback  probability.  The  expected  number  of  visits  a  customer  makes  to  the  queue  is 
1/(1  —  p)  and  the  traffic  intensity  is  p  =  <f>/[p(l  —  p)~\.  We  let  Q(s)  denote  the  queue  length  at 
(simulated)  time  s,  including  the  customer  in  service.  Then  Q{s)=»Q  as  s  -*  oo  provided  p  <  1.0. 
The  output  performance  measure  of  interest  is  the  steady-state  mean  queue  length, 
a  =  E [Q]  =  p/(l  —  p).  We  set  <j>  =  1,  p  =  20,  and  p  =  0.9,  so  that  p  =  0.50  and  a  =  1.0.  We  ran 
experiments  with  two  sets  of  initial  conditions:  2(0)  =  0  and  2(0)  =  5. 

A  diagram  of  the  open  Jackson  network  is  shown  in  Figure  1.  This  network  is  sometimes 
called  an  open  central  server  model  (see  Buzen  (1973))  with  server  0  representing  a  CPU  (central 
processing  unit),  and  servers  1  to  4  representing  I/O  devices.  There  is  a  single  type  of  job.  Jobs 
arrive  to  the  network  (at  the  CPU)  according  to  a  Poisson  process  with  rate  4>.  All  servers  operate 
using  the  FCFS  service  discipline  and  the  service  times  of  jobs  at  server  i  are  iid  exponentially 
distributed  rvs  with  mean  s,.  When  a  job  leaves  the  CPU,  it  goes  to  I/O  device  i  with  probability 
Pi  (1  <,  i£  4),  and  when  a  job  leaves  an  I/O  device,  it  goes  to  the  CPU  with  probability  p$  and 
exits  the  system  with  probability  (1  —  p0).  Let  Qt{s)  denote  the  queue  length  at  server  i  at  time  r 
(including  the  customer  in  service)  and  let  p(  denote  the  traffic  intensity  at  server  i.  Provided 

Pi  <  1,  then  (2o(J).  .  Qiity^iQo . Qi)  as  s  —  oo.  Under  the  above  assumptions,  the  steady- 

state  distribution  of  ( Q0 ,  ...  ,  2a)  has  a  product  form,  and  in  particular  E[2J  =  p,/(l  -  p,)  The 
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output  performance  measure  of  interest  is  a  =  E[Q0].  We  set  <f>  =  1.0,  p^  =  0.75,  pt  =  0.25  for 
1,  Jq  =  0.1875,  and  rz  =  0.50  for  i  ^  1.  With  these  parameters,  p0  —  0.75,  ^  =  0.50  for  is  1, 
a  =  E[Qg]  —  3.00,  and  E[QJ  =  10  for  is  1.  This  model  was  simulated  with  initial  conditions 
Qt{0)  =  0  for  all  i. 

The  closed,  product  form  queueing  network  model  is  shown  in  Figure  2.  This  model  is 
sometimes  called  a  closed  central  server  model.  Again  server  0  represents  a  CPU  and  servers  1  to 
4  represent  I/O  devices.  There  are  a  fixed  number  of  jobs  N  circulating  in  the  network.  As  in  the 
open  model,  the  service  discipline  is  FCFS  at  all  servers  and  we  assume  iid  exponentially  distrib¬ 
uted  service  times  with  mean  rz  at  server  i.  When  a  job  leaves  the  CPU,  it  goes  to  I/O  device  / 
with  probability  pL  and  when  a  job  leaves  an  I/O  device  it  goes  back  to  the  CPU.  Let  Qt(s) 
denote  the  queue  length  at  server  i  at  time  s  (including  the  customer  in  service)  and  let  p,-  denote 
the  steady-state  utilization  of  server  /.  Then  ( Q0(s ), ...  ,  Qi(s))=>(Q0, ...  ,  Q^)  as  s  -*  oo.  The  per¬ 
formance  measure  of  interest  is  a  =  E[£)0].  W’e  set  N  =  10,  pt  =  0.3  for  1  <  i  <  3,  p4  =  0. 1,  r0  =  1.0, 
st  =  2.0  for  l<i<3,  and  r4=11.0.  With  these  parameters,  p0  =  0.82,  p,- =  0.49  for  1  <  i <  3, 
p4  =  0.90,  cl  =  E[£>o]  =  3.06,  E[(9/]  =  0.92  for  1  <  i  <  3,  and  =  4.17.  This  model  was  simulated 
with  initial  conditions  Q0( 0)  =  6  and  (3,(0)  =  1  for  /  >  1. 

Because  we  did  not  have  convenient  access  to  a  very  highly  parallel  machine,  all  exper¬ 
iments  were  run  on  a  single  processor.  The  effect  of  running  parallel  replications  on  multiple 
processors  with  a  computer  time  stopping  constraint  was  simulated  as  follows.  For  the  queue 
length  processes,  we  assumed  that  each  event  (external  arrival  or  service  completion)  took  one 
unit  of  computer  time  to  process.  Thus  C/(l)  is  the  number  of  events  completed  at  simulated  tune 
i  (on  replication  i)  and  T,(c)  is  the  amount  of  simulated  time  generated  after  processing  c  events. 

For  the  M/M/1  waiting  time  simulations,  we  let  Ct{t)  be  the  arrival  time  of  customer 
number  i  and  Tt{c)  be  the  number  of  customers  that  arrive  in  the  interval  (0,c).  In  this  example, 
the  integrals  in  Equations  2.3  and  2.4  get  replaced  by  sums.  Using  these  definitions  for  C,(f)  and 

_  n  —  1 

T,(c)  allows  the  ratio  bias  term  dT  to  be  calculated  analytically,  as  follows.  Let  Wn  =  X  be 

_  n  k » 0 

the  average  waiting  time  of  the  first  n  customers  and  let  An  =  £  Ak/n  be  the  average  interarrival 

_  *  «  l 

time  of  the  first  n  customers.  Note  that  A„  =  Cz(n)/n.  Since  7}(n)/n  -» <f>  (the  arrival  rate),  in  this 
example  we  have  <f>  -  k.  We  also  have 

V^T  (  Wn  -  a.  ,  An  -  r1  )  =>  N(0,B)  (3.1) 

for  some  covanance  matrix  B.  By  Theorem  5  of  Glynn  and  Heidelberger  (1989b),  B  and  A  (the 
covariance  matrix  in  Equation  2.3)  are  related  by  Au  =  BUIX,  A]2  =  —  and  A22  -  X3B22 
Thus  by  Equation  14,  dT=  B]2  (  =  -  A]2jk).  All  the  terms  of  B  can  be  explicitly  calculated  (wc 
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set  p=l):  Bu  =  p[2  -f  5p  —  4p2  +  p3]/(i  -  p)4  (see,  e.g.,  Blomqvist  (1967)  or,  more  recently, 
V  mtt  (1989)),  B22  =  Varf/f^]  =  1/A  ,  and,  by  using  regenerative  process  theory  (see,  e  g  ,  Crane 
and  Iglehart  (1975)) 


B 


12 


Cov[  (F,  —  a/V|)  ,  (/)  -  A~'/Vj)  ] 
F.[A'] 


(3.2) 


where  Yt  is  the  sum  of  the  waiting  times  in  the  i-th  regenerative  cycle,  r,  is  the  sum  of  the  interar- 
nval  times  in  the  i-th  cycle  and  is  the  number  of  arrivals  in  the  i-th  cycle  Now 

F.[/V,]  =  1/(1  —  p),  while  the  covanance  term  in  Equation  3.2  can  be  calculated  from  results  con¬ 
tained  in  Lavenberg,  Moeller  and  Sauer  (1977)  (the  research  report  version  of  Lavenberg,  Moeller 
and  Sauer  (1979)).  (This  covanance  term  arises  in  the  context  of  control  variables  for  simulation 
vanance  reduction.)  Specifically,  for  M/G/l  queues 

Cov[(  Yj  -  a.N')  ,  (it j—  NJX)]  =  —  62/[2(l  -  p)  ]  where  1S  second  moment  of  the  service 
times  Since,  for  M/M/1  (with  p=  1)  =  2,  Equation  3.2  reduces  to  dT  =  fl12  =  —  1/(1  —  p)1 . 

The  effect  of  ratio  bias  on  confidence  interval  coverage  can  now  be  calculated  analytically. 
Let  d>(jc)  =  Pf,V(0, 1)  ^  x}  and  defrne  z4/2  by  <£>(z6i2)  =  1  -  <5/2.  From  part  2  of  Theorem  3,  if 
P\cp  =  m,  and  /?  =  dTJmjAu  ,  then 

zz6t7)  *  P{|/V(O,1)  +  0I  <z6l2)  ^ 

=  ®(Z6I2  ~  P)  ~  ~  Z&i2  -  P)  . 


Thus  for  any  given  p,  P,  and  cP,  the  actual  coverage  of  presumed  100  x(l  -  d)%  confidence 
intervals  can  be  predicted.  In  Section  5,  we  will  compare  the  predicted  coverage  with  the  actual 
coverage  observed  in  simulation  experiments.  Note  that  by  usmg  the  heavy  traffic  approximation 
Bu  ^4p/(l  —  p)4  (see  Whitt  (1989))  we  obtain  /?  «  This  approximation  also  works 

well  for  moderate  values  of  p.  Thus  for  given  P  and  cP,  we  expect  the  loss  in  coverage  due  to 
ratio  bias  to  be  approximately  independent  of  the  traffic  intensity  (provided  p  is  not  too  small  and 
P  and  cP  are  large  enough  that  the  central  limit  theorem  is  valid).  This  behavior  will  be  observed 
in  Section  5. 

4.  Design  of  the  Simulation  Experiments 

In  this  section  we  describe  how  the  simulation  experiments  were  performed.  As  mentioned 
earlier,  the  effect  of  running  parallel  replications  on  multiple  processors  with  a  computer  time 
stopping  constraint  was  simulated  on  a  single  processor.  For  the  various  models,  and  different 
values  of  P,  c  and  k(c),  we  were  interested  in  estimating  the  mean,  vanance  and  confidence 
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interval  coverage  of  aj{P,  c)  and  ctR(P,  c).  We  built  a  simple  queueing  network  simulator  suitable 
for  these  purposes.  (We  used  the  combined  generator  described  in  L'Ecuyer  (1988)  as  a  source  of 
random  numbers.)  To  estimate  these  quantities  for  given  values  of  P,  c,  avd  k(c),  M  "super  repli¬ 
cations”  were  performed  where  each  super  replication  consisted  of  P  replications,  each  of  length  c 
and  having  truncation  irterval  x(c).  Thus  for  super  replication  j  (1  <,j  <>  M)  samples  aj{P,c,j)  and 
aR(P,c,j)  of  njiP,c)  and  aR(P,c),  respectively,  were  obtained  (according  to  Equations  2.5  and  2.6). 
E[«t(P,c)]  and  E[as(/5,c)]  were  estimated  by  a j(P,c)=  £  *7{P<c,j)IM  and 

M  ('  =  1 

aR{P,c)=  X  aR{P> respectively.  The  sample  standard  deviations,  Sj{P,c)  and  SR(P,c)  of 

i  =  1 

a.j{P,c)  and  aR(P,c),  respectively,  were  computed  in  the  usual  way,  e.g. 
Sj<P,c)  =  I  (ariP.c,/)  -  tT{P,c))2l[M(M  -  1)]. 

/=  l 

On  each  super  replication  we  also  obtained  asymptotic  standard  deviation  estimates 
oj(P,c,j)  and  aR(P,c,j)  for  aj{P,  c)  and  aR(P ,  c),  respectively.  These  were  estimated  as  follows. 
Let  Y,(c,j)  and  t ,(c,/)  be  the  samples  of  Yt(c)  and  r,(c)  obtained  on  the  y-th  super  replication. 
Then 


fyP.cj)  = 


c 

X 

1  =  1 


K<c,D 

T,(c,y) 


-  a.j{P,C,j) 


p-  1 


(4.1) 


Computation  of  oR(P,c,j)  is  analogous  to  variance  estimation  in  regenerative  simulation: 

P 

y  Z  (^c’»2  -  2ag(P,c,j)  Yfaji^c, j)  +  aR(P,CJ-)2^c,j)2) 

1 - 


o2r{Px,D  = 


i  =  1 


\ 


(4.2) 


i  Z«./> j 


rrom  these  point  and  vanance  estimates,  presumed  100  x  (1  —  <5)%  confidence  intervals  for  a  can 
be  formed  as  follows.  Using  the  traditional  estimator  the  confidence  interval  is 
±  ts,2(P-  \)Zj(P,c,j)lJP  where  t6j7{P  -  \)  is  defined  by 

1  —  <5/2  =  P{TP_  ,  <1  t6f7(P  -  1)}  and  TP_  ,  has  a  Student's  t  distribution  with  (P  -  1)  degrees  of 
freedom.  Using  the  classical  ratio  estimator  the  confidence  interval  is 

<*r(P>c<D  ±  Z6I2°r(P’C’J)I'J~P  ■  For  a  given  estimator,  we  define  its  coverage  to  be  the  fraction  of 
these  confidence  intervals  that  actually  contain  a.  If  valid  confidence  intervals  are  being  formed 
for  a,  then,  by  definition  the  coverage  should  converge  to  (l-<5)  as  A/-*  oo.  In  all  cases  we  set 
(5  =  0.1  corresponding  to  90%  confidence  intervals. 
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The  simulator  was  organized  in  such  a  way  that  statistics  could  be  collected  for  multiple 
values  of  P,  c  and  k(c)  from  the  same  set  of  runs.  Thus  the  data  generated  for  a  particula1-  model 
are  correlated.  We  took  values  of  P  to  be  powers  of  two,  ranging  from  P  =  8  to  P  -  512  for  the 
CTMCs  and  P=  128  to  P  =  1024  for  the  M/M/1  waiting  time  simulations.  Each  super  repli¬ 
cation  for  P  processors  also  comprised  2  super  replications  for  P/2  processors,  4  super  replications 
for  P/4  processors,  etc.  We  used  200  super  replications  for  the  largest  values  of  P  in  each  case. 
Thus,  e  g.,  12,800  super  replications  of  the  CTMCs  were  obtained  fpr  P=  8.  These  sample  sizes 
were  generally  large  enough  so  that  very  accurate  point  estimates  were  obtained. 


5.  Experimental  Results 

The  first  set  of  experiments  are  for  the  M/M/1  waiting  times.  The  purpose  of  these  exper¬ 
iments  is  to  compare  the  analytic  results  of  Section  3  to  actual  simulation  results.  To  isolate  just 
the  effect  of  the  ratio  bias,  these  simulations  were  started  in  the  steady-state  distribution.  We 
simulated  until  (simulated)  time  c=  1,000.  By  deleting  customers  arriving  before  times  k(c)  =  100, 
250  and  500,  we  obtained  runs  of  effective  lengths  c  —  k(c)  =  900,  750  and  500,  respectively.  We 
simulated  at  p  —  0.50  and  p  =  0.75. 

The  results  of  these  experiments  are  listed  in  Table  1.  Table  1  lists  the  predicted  coverages 
for  aj{P,  c )  as  calculated  by  Equation  3.3  (using  t.  ffective  run  length  for  c  in  that  equation),  as 
well  as  the  actual  coverages  for  ^/{P,  c)  and  a R(P,  c)  observed  in  the  simulations.  Table  1  indi¬ 
cates  generally  excellent  agreement  between  the  predictions  and  the  experiments.  Notice  that,  for 
given  P  and  k(c),  the  predicted  and  actual  coverage  for  a^P,  c)  is  quite  insensitive  to  the  value  of 
p,  as  explained  in  Section  3.  In  addition,  for  fixed  k(c),  as  P  increases  the  coverage  for  aj{P,  c) 
decreases.  This  is  in  agreement  with  part  2  of  Theorem  3  and  is  explained  by  the  fact  that  as  P 
increases,  increasingly  accurate  estimates  of  (the  biased)  E[aj{P,  c)]  are  obtained.  This  loss  in 
coverage  is  greatest  for  the  largest  value  of  k(c)  since  that  corresponds  to  the  smallest  effective  run 
length.  On  the  other  hand,  the  coverage  for  aR(P,  c)  stays  close  to  its  nominal  value  of  0.90. 

Figures  3  to  5  plot  results  from  simulations  of  the  M/M/1  queue  v  th  feedback.  Figure  3 
plots  5 -j{P,c)  and  aR{P,c)  as  a  function  of  k(c)  for  c=  1000  events,  P=  512,  and  two  different 
initial  queue  lengths.  Actually,  when  Q( 0)  =  0,  a j{P,  c )  appears  almost  unbiased  without  trun¬ 
cation  (k(c)  =  0),  but  aj(P,c)  increases  above  the  .steady  sta„e  value  of  a  =  1.0  as  k(c)  increases. 
In  this  case,  the  initialization  bias  and  ratio  bias  are  of  opposite  signs  and,  in  effect,  approximately 
cancel  each  other  out  when  k(c)  =  0.  When  2(0)  =  5,  a.j{P,c)  decreases  as  k (c)  increases,  but, 
again,  does  not  come  close  to  «  For  k(c)  =  500  the  values  of  *7 <P,c)  are  very  nearly  the  same  for 
both  2(0)  =  0  and  Q(0)  =  5,  but  are  about  8%  above  the  steady-state  value.  On  the  other  hand, 
aR(P,c)  approaches  a  as  *(c)  increases  for  both  2(0)  =  0  and  Q(0 )  =  5.  These  point  estimates  are 
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very  accurate.  For  example,  when  Q( 0)  =  0  and  k(c)  =  500,  a^S^.c)  =  1.084,  St<512,c)  =  0.002, 
«*(512,c)  =  1.002  and  Ss(5I2,c)  =  0.002. 

Figure  4  plots  the  coverages  for  these  estimators  (without  deletion)  as  a  function  of  P. 
Because,  by  coincidence,  Efa^P,  c)]»a  when  ^(0)  =  0  and  k(c)  =  0,  the  coverage  for  aj(P,c) 
remains  at  or  near  the  nominal  value  of  0.90.  However,  the  coverage  for  aj(P ,  c)  decreases  (to 
zero)  as  P  increases  when  Q( 0)  =  5  because  of  the  stronger  initialization  bias.  Similarly,  because 
of  initialization  bias,  the  coverage  for  a. R(P,  c)  is  seriously  degraded  for  both  Q{ 0)  =  0  and 
2(0)  «  5. 

Figure  5  shows  the  coverages  when  k(c)  =  250.  With  this  value  of  k(c),  the  initialization  bias 
is  essentially  eliminated,  although  ratio  bias  is  still  present:  for  example,  when  (2(0)  =  5, 
«*(512,c)  =  1.004  compared  to  a  =  1.0  while  a^(5l2,c)  =  1.060.  Because  of  the  ratio  bias,  the  cov¬ 
erage  for  a  j(P,  c)  decreases  from  around  0.90  to  less  than  0.20  as  P  increases  from  8  to  512  for 
both  initial  conditions.  Significant  coverage  loss  begins  to  be  observed  in  the  range  from  P  =  32 
to  P  —  64.  On  the  other  hand,  the  coverage  for  aR(P,  c )  starts  out  slightly  below  0.90  for  P  =  8 
and  then  rapidly  approaches  0.90  as  P  increases.  The  low  coverage  when  P  -  8  is  due  both  to  a 
less  robust  variance  estimate  as  well  as  to  the  use  of  a  normal  multiplier,  rather  than  a 
t-multiplier,  in  the  confidence  interval.  For  example,  when  Q( 0)  =  0  and  a  t-multiplier  with  7 
degrees  of  freedom  is  used  instead  of  the  normal  multiplier,  the  coverage  for  as(8,P)  increases 
from  0.820  to  0.864. 

Figures  6  and  7  display  results  of  simulating  the  open  central  server  model.  This  network 
was  simulated  for  c  =  2500  events.  Figure  6  plots  a^P.c )  and  aR(P,c)  as  a  function  of  k(c)  for 
P  =  8,  64  and  512.  (Because  of  the  organization  of  the  simulator's  data  collection  facilities, 
aj{P,c)  is  independent  of  P  )  Initialization  bias  is  essentially  eliminated  by  k(c)  =  1000,  but  signif¬ 
icant  ratio  bias  is  still  evident  in  a  j(P,c)-  Note  also  that  there  are  only  slight  differences  between 
«/j(8,c),  5^(64, c)  and  5ft(512,c).  Because  of  the  initialization  bias,  without  deletion,  the  coverage 
for  both  a  j(P,  c)  and  aR(P,c)  are  well  below  the  0.90  level:  the  coverage  for  a/{64,c)  is  0.627 
while  the  coverage  for  aR(64 ,c)  is  0.457.  Note  that  when  k(c)  =  250,  a^P,  c)  is,  by  chance,  almost 
unbiased.  Thus,  with  this  amount  of  deletion,  the  coverage  for  a^P,  c)  will  be  (approximately) 
correct,  but  for  the  wrong  reason.  For  example,  aj(5\2x)  has  coverage  0.91  while  aR(512,c)  has 
coverage  of  only  0.60. 

Figure  7  plots  the  coverages  for  aj{P,  c )  and  a^(P,  c)  as  a  function  of  P  when  initialization 
bias  is  essentially  removed  (k(c)  =  1000).  Again,  the  coverage  for  a j{P,  c)  decreases  as  P  increases, 
while  the  coverage  for  clr(P,  c)  increases  to  and  then  remains  at  or  near  the  nominal  0.90  level. 

Figure  8  displays  a  similar  pattern  for  simulations  of  the  closed  central  server  model.  This 
figure  plots  coverage  results  when  c  =  2000  and  k(c)  =  500.  With  these  parameters,  initialization 
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bias  is  removed  but  ratio  bias  is  still  present.  The  steady-state  value  being  estimated  is  a  =  3  057, 
and  a*(512,c)  =  3.056  (S*(512,c)  =  0.002)  while  5^512, e)  =  3.111  (5^512, c)  =  0.002). 

As  has  been  indicated  several  times  above,  for  given  values  of  P,  c  and  k(c),  the  values  of 
Sj{P,c)  and  SR(P,c)  have  been  very  nearly  the  same.  This  has  been  observed  throughout  our 
experiments.  This  is  explained  by  the  fact  that,  even  with  ratio  bias  still  present,  aj{P,  c)  and 
aR(P,  c)  both  obey  central  limit  theorems  with  the  same  asymptotic  standard  deviation  (see  Theo¬ 
rems  3  and  4). 

6.  Summary  and  Conclusions 

This  paper  has  considered  the  problem  of  estimating  steady-state  parameters  on  multiple 
processors  via  the  method  of  parallel  replications.  While  the  method  is  conceptually  straightfor¬ 
ward  to  apply,  statistical  considerations  point  to  the  need  for  using  an  alternative  steady-state  esti¬ 
mation  procedure.  This  need  arises  because  the  traditional  estimator,  <*j{P,  c),  contains  two 
sources  of  bias  having  the  same  order  of  magnitude:  initialization  bias  and  ratio  bias.  While 
appropriate  deletion  of  an  initial  portion  of  each  simulation  effectively  removes  initialization  bias, 
it  does  not  affect  the  ratio  bias.  When  using  a  large  number  processors,  this  residual  ratio  bias 
results  in  a  biased  estimate  and  corresponding  loss  in  confidence  interval  coverage. 

The  alternative  estimator,  aR(P ,  c),  corresponds  to  the  classical  ratio  estimator  that  is  com¬ 
monly  used  in  regenerative  simulation.  Its  ratio  bias  is  order  P  times  smaller  than  its  initialization 
bias.  Thus  appropriate  deletion  is  effective  in  removing  the  major  source  of  bias.  The  net  effect 
is  that  by  using  aR{P,  c)  rather  than  aj(P,  c)  allows  one  to  either: 

1 .  use  many  more  processors  for  a  given  amount  of  computing  time  per  processor,  or 

2.  make  shorter  runs  for  a  given  number  of  processors. 

This  paper  examined  these  issues  empirically  via  simulations  of  a  variety  of  queueing 
systems.  Our  experiments  confirm  the  theoretical  results,  and  indicate  that  the  ratio  bias  can 
become  a  problem  even  on  moderately  sized  parallel  processors  with,  say,  32  to  64  processors. 

The  results  of  this  paper  also  have  some  applicability  to  the  standard  single  processor 
method  of  independent  replications.  In  this  method,  the  replication  length  is  often  determined  by 
either  the  total  number  of  events,  a  simulated  time  limit,  a  computer  time  limit,  or  the  number  of 
events  of  a  particular  type  such  as  the  number  of  departures  from  a  queue.  (Sometimes  a  combi¬ 
nation  of  these  limits  is  used.)  When  estimating  many  parameters  in  a  queueing  network,  there 
will  always  be  some  parameters  that  are  estimated  on  a  different  time  scale  than  that  used  to 
determine  the  replication  length.  Thus  the  denominator  of  some  parameter  estimates  will  be 
random,  resulting  in  ratio  bias.  For  example,  if  simulated  time  is  used  to  control  the  replication 
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length,  then  response  time  estimates  will  have  a  random  denominator  (the  number  of  customers 
departing  from  the  queue).  On  the  other  hand,  if  an  event  count  is  used  to  control  the  replication 
length,  then  queue  length  estimates  will  have  a  random  denominator  (the  simulated  time).  Thus 
ratio  bias  could  be  a  concern,  even  on  a  single  processor.  However,  there  is  usually  little  moti¬ 
vation  to  run  a  very  large  number  of  short  replications  on  a  single  processor,  since  either  batch 
means  or  a  running  few  long  replications  will  be  less  sensitive  to  initialization  bias.  Never-the- 
less,  the  issue  of  ratio  bias  should  be  kept  in  mind.  In  fact,  for  a  small  number  of  replications,  the 
ratio  form  of  aR(P,c)  suggests  the  use  of  jackknifing  (see  Miller  (1974))  for  both  (ratio)  bias 
reduction  and  for  robust  variance  estimation.  However,  the  properties  and  validity  of  jackknifing 
in  this  situation  have  not  yet  been  established,  and  remain  as  open  problems  for  research. 

In  addition,  if  the  replication  length  is  determined  within  a  sequential  procedure  (see,  e.g., 
Law  and  Kelton  (1982)),  then  the  denominator  of  the  estimates  will  typically  be  random  resulting 
in  possible  ratio  bias.  This  will  also  be  true  if  the  length  of  the  truncation  interval  is  determined 
by  statistical  tests  of  the  simulation  output  (see,  e  g.,  Schruben  (1982)).  The  effect  of  ratio  bias  in 
these  situations  also  has  yet  to  be  analyzed 
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Table  1 


Predicted  and  Actual  90%  Confidence  Interval  Coverages 
in  M/M/1  Queue  Waiting  Time  Simulations  with  c  =  1000 
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Abstract:  A  simple  and  effective  way  to  exploit  parallel  processors  in  discrete  event  simu¬ 

lations  is  to  run  multiple  independent  replications,  in  parallel,  on  multiple  processors  and  to 
average  the  results  at  the  end  of  the  runs  We  call  this  the  method  of  parallel  replications 
This  paper  is  concerned  with  using  the  method  of  parallel  replications  for  estimating  steady- 
state  performance  measures.  We  report  on  the  results  of  queueing  network  simulation 
experiments  that  compare  the  statistical  properties  of  several  possible  estimators  that  can 
be  formed  using  this  method  The  theoretical  asymptotic  properties  of  these  estimators 
were  determined  in  Glynn  and  Heidelberger  (1989a  and  1989b)  Both  the  theory  and  the 
experimental  results  reported  here  strongly  indicate  that  a  nonstandard  (in  the  context  of 
steady-state  simulation),  yet  easy  to  apply,  estimation  procedure  is  required  on  highly  par¬ 
allel  machines  This  nonstandard  estimator  is  a  ratio  estimator.  The  experiments  also  show 
that  use  of  the  ratio  estimator  is  advantageous  even  on  machines  with  only  a  moderate 
degree  of  parallelism 
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